The widespread use of face masks during and after the COVID-19 pandemic has significantly weakened traditional facial emotion recognition systems, which depend on full facial visibility. Existing RGB-based models fail when the lower face, including the mouth and nose, is covered. This paper proposes a robust system for human emotion classification based on facial thermal infrared features that remain detectable when a mask is worn. Thermal cameras capture involuntary changes in skin temperature driven by the autonomic nervous system's response to emotional states. Region-specific thermal features are extracted from the visible facial zones (forehead, periorbital areas, nose bridge, and cheekbones) and fed into a fine-tuned ResNet-50 deep learning model. The system classifies six primary emotions (happiness, sadness, anger, fear, disgust, and surprise) and achieves an overall accuracy of 87.4% on masked-face test data, outperforming conventional RGB methods by over 26 percentage points. The framework offers a practical, mask-robust solution for affective computing in healthcare, intelligent surveillance, and human-robot interaction.
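The abstract states that thermal data are fed into a fine-tuned ResNet-50 but does not describe how a thermal frame is prepared for that network. The following is a minimal sketch of one common approach, assuming the camera already delivers a per-pixel temperature map in degrees Celsius: clip to a skin-temperature window, scale to [0, 1], resize, and replicate the single channel to match ResNet-50's 224×224×3 input. The window `T_MIN`/`T_MAX` and the function name `thermal_to_resnet_input` are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical skin-temperature window in degrees Celsius. The paper does not
# state a normalization range; these values are illustrative assumptions.
T_MIN, T_MAX = 28.0, 38.0


def thermal_to_resnet_input(temp_c: np.ndarray, size: int = 224) -> np.ndarray:
    """Convert a 2-D temperature map (deg C) into a (size, size, 3) float32 array.

    Clips to the assumed window, scales to [0, 1], resizes with
    nearest-neighbour sampling, and repeats the single thermal channel three
    times so the shape matches ResNet-50's RGB input.
    """
    if temp_c.ndim != 2:
        raise ValueError("expected a single-channel 2-D temperature map")
    scaled = (np.clip(temp_c, T_MIN, T_MAX) - T_MIN) / (T_MAX - T_MIN)
    h, w = scaled.shape
    # Nearest-neighbour resize by index sampling (no extra dependencies).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = scaled[rows[:, None], cols[None, :]]
    return np.repeat(resized[:, :, None], 3, axis=2).astype(np.float32)
```

If pretrained Keras weights are used, the [0, 1] output would typically be rescaled to the 0 to 255 range and passed through `tf.keras.applications.resnet50.preprocess_input`; whether the authors did this is not stated.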
Introduction
The paper presents a thermal-based facial emotion recognition system designed for masked individuals. It addresses a major limitation of traditional RGB-based models, which fail when the lower face is occluded (e.g., by the face masks widely worn during the COVID-19 pandemic).
Key Idea
Because masks hide visible facial cues such as the mouth, the system relies on thermal infrared imaging, which captures emotion-driven temperature variations in regions the mask leaves uncovered (e.g., around the eyes, the forehead, and the nose bridge).
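The region-specific features can be illustrated with simple per-region temperature statistics. The sketch below assumes an aligned face crop given as a temperature map in degrees Celsius and computes the mean, standard deviation, and maximum temperature inside each named region. The box coordinates in `ROIS`, the choice of statistics, and the name `region_features` are illustrative assumptions; the paper names the regions but specifies neither their coordinates nor the exact features.

```python
import numpy as np

# Hypothetical ROI boxes (row0, row1, col0, col1) as fractions of an upright,
# roughly centred face crop. Placeholders only; not the paper's coordinates.
ROIS = {
    "forehead": (0.05, 0.25, 0.25, 0.75),
    "periorbital_left": (0.30, 0.42, 0.15, 0.45),
    "periorbital_right": (0.30, 0.42, 0.55, 0.85),
    "nose_bridge": (0.35, 0.50, 0.42, 0.58),
    "cheekbone_left": (0.45, 0.58, 0.10, 0.35),
    "cheekbone_right": (0.45, 0.58, 0.65, 0.90),
}


def region_features(face_temp_c: np.ndarray) -> np.ndarray:
    """Return [mean, std, max] temperature per ROI, concatenated in ROIS order."""
    h, w = face_temp_c.shape
    feats = []
    for name, (r0, r1, c0, c1) in ROIS.items():
        patch = face_temp_c[int(r0 * h):int(r1 * h), int(c0 * w):int(c1 * w)]
        if patch.size == 0:
            raise ValueError(f"face crop too small for ROI {name!r}")
        feats.extend([patch.mean(), patch.std(), patch.max()])
    return np.asarray(feats, dtype=np.float32)
```

With six regions and three statistics each, the sketch yields an 18-dimensional handcrafted vector per face.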
Contributions
A complete pipeline for masked emotion recognition using thermal images
A custom dataset of 7,200 thermal images from 60 masked subjects
A hybrid model combining ResNet-50 deep features with handcrafted thermal features
Training in TensorFlow/Keras with early stopping
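The hybrid model and its training loop can be sketched at the feature level. Assuming ResNet-50 with global average pooling (a 2048-dimensional output) as the deep branch, the handcrafted region features are standardized with training-set statistics and concatenated with it; in Keras this step would usually be a `tf.keras.layers.Concatenate` layer, and early stopping would typically use `tf.keras.callbacks.EarlyStopping`. The NumPy version below only mirrors that logic for illustration: `fuse_features`, `EarlyStopper`, and the patience value are hypothetical, and the paper does not report its fusion details or stopping criterion.

```python
import numpy as np


def fuse_features(deep: np.ndarray, handcrafted: np.ndarray,
                  mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Concatenate deep features with z-scored handcrafted features.

    `deep` is (N, 2048), the size of ResNet-50's globally pooled output, and
    `handcrafted` is (N, K). `mean` and `std` should come from the training
    split only; zero std entries are replaced by 1 to avoid division by zero.
    """
    z = (handcrafted - mean) / np.where(std > 0, std, 1.0)
    return np.concatenate([deep, z], axis=1)


class EarlyStopper:
    """Patience-based early stopping on validation loss (lower is better)."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember this epoch and reset the patience counter.
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

With `patience=3`, the validation-loss sequence 1.0, 0.8, 0.7, 0.72, 0.71, 0.75 stops training after the sixth epoch and reports epoch index 2 as the best; reloading the weights from that epoch corresponds to `restore_best_weights=True` in the Keras callback.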
Results
Overall accuracy: 87.4%, macro F1-score: 0.84
Best performance: Happiness (F1 = 0.94)
Lower performance: Fear and Disgust, whose thermal patterns are similar
Outperforms:
RGB CNN models (by over 26 percentage points on masked-face test data)
Previous thermal methods (+12.8% improvement)
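The reported metrics can be computed from a confusion matrix. Macro F1 is the unweighted mean of the per-class F1 scores, so weak classes such as Fear and Disgust lower it regardless of how many test samples they contain. The sketch below computes accuracy, per-class F1, and macro F1; the helper names are hypothetical, and the toy matrix used as an example is made up for illustration, not the paper's data.

```python
import numpy as np

EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]


def per_class_f1(cm: np.ndarray) -> np.ndarray:
    """Per-class F1 from a confusion matrix with rows = true, columns = predicted."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    return np.divide(2 * precision * recall, denom,
                     out=np.zeros_like(tp), where=denom > 0)


def summarize(cm: np.ndarray) -> dict:
    """Accuracy, macro F1 (unweighted mean over classes), and per-class F1."""
    f1 = per_class_f1(cm)
    return {
        "accuracy": np.trace(cm) / cm.sum(),
        "macro_f1": f1.mean(),
        "per_class_f1": dict(zip(EMOTIONS, f1.round(3))),
    }
```

For example, a toy matrix with 10 correct predictions per class, except that half of the Fear samples are predicted as Disgust, gives an accuracy of 55/60 (about 0.917), an F1 of about 0.667 for Fear, and 0.8 for Disgust, because Disgust's precision drops to 10/15.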
Conclusion
This paper proposed a novel framework for human emotion classification using facial thermal features in the presence of face masks. By leveraging involuntary skin temperature patterns in the visible upper-face regions, the system overcomes the key limitation of RGB approaches under occlusion. The modified ResNet-50 achieved 87.4% accuracy, the highest reported for full-mask occlusion, within a modular, reproducible pipeline spanning data acquisition to real-time inference.
Future work includes: (i) multi-spectral sensor fusion with near-infrared cameras; (ii) knowledge distillation for edge deployment; (iii) dataset expansion across age groups, ethnicities, and mask types (N95, cloth, transparent); and (iv) a study of how the thermal transmittance of mask materials affects accuracy.